Split into two ranges to read messages if offsetInFileOfOldestMessage > offsetInFileAtEndOfNewestMessage to avoid possible loop#72
Conversation
This reverts commit 69881ef.
…fsetInFileAtEndOfNewestMessage to avoid possible loop
dfed
left a comment
There was a problem hiding this comment.
I like how you're thinking about this problem! I need to dig in more before I can approve, which I won't have time to do until next week, but I'm excited by this direction 😄. In the meantime, I've left some comments/questions.
| // There is only one range: | `offsetInFileOfOldestMessage` -> `offsetInFileAtEndOfNewestMessage`| | ||
| if reader.offsetInFileOfOldestMessage < reader.offsetInFileAtEndOfNewestMessage { | ||
| for encodedMessage in try reader.encodedMessagesFromOffset(reader.offsetInFileOfOldestMessage, | ||
| endOffset: reader.offsetInFileAtEndOfNewestMessage) { | ||
| messages.append(try decoder.decode(T.self, from: encodedMessage)) | ||
| } | ||
| } else { | ||
| // In this case, the messages could be split to two ranges | ||
| // | First Range | (GAP: ignore) | Second Range | | ||
|
|
||
| // This is second range: | `offsetInFileOfOldestMessage` -> EOF | | ||
| for encodedMessage in try reader.encodedMessagesFromOffset(reader.offsetInFileOfOldestMessage) { | ||
| messages.append(try decoder.decode(T.self, from: encodedMessage)) | ||
| } | ||
|
|
||
| // This is first range: | `expectedEndOfHeaderInFile` -> `offsetInFileAtEndOfNewestMessage`| | ||
| for encodedMessage in try reader.encodedMessagesFromOffset( | ||
| FileHeader.expectedEndOfHeaderInFile, | ||
| endOffset: reader.offsetInFileAtEndOfNewestMessage) { | ||
| messages.append(try decoder.decode(T.self, from: encodedMessage)) | ||
| } | ||
| } |
There was a problem hiding this comment.
This is a clever approach! I don't have time to dig to deeply into this PR this week, but I'll do my best to get to it next week. My initial impression is that this approach seems reasonable. At a minimum, I really like the code comment you've written here – they'll really help future maintainers (myself included!).
There was a problem hiding this comment.
I like how you're thinking about this problem! I need to dig in more before I can approve, which I won't have time to do until next week, but I'm excited by this direction 😄. In the meantime, I've left some comments/questions.
This approach came in when I read the discussion in PR#64 again. really happy to see this move forward if you feel it right too : D @dfed
| XCTAssertEqual(messages, []) | ||
| } | ||
|
|
||
| func test_messages_whenOffsetInFileAtEndOfNewestMessageIsBeyondEndOfNewestMessageButBeforeEndOfFile_throwsFileCorrupted() throws { |
There was a problem hiding this comment.
why is this test (and the deleted one below) no longer valid? I think testing these cases is still valuable, but I might be missing something.
There was a problem hiding this comment.
These tests should be kept. I reverted the last commit, this was deleted then.
| let message: TestableMessage = "This is a test" | ||
| let maximumBytes = try requiredByteCount(for: [message]) | ||
| func test_messages_throwsFileCorruptedWhenOffsetInFileAtEndOfNewsetMessageOutOfSync() throws { | ||
| let randomHighValue: UInt64 = 10_1000 |
There was a problem hiding this comment.
let's make this number less confusing (and fix an old mistake of mine) by doing either:
| let randomHighValue: UInt64 = 10_1000 | |
| let randomHighValue: UInt64 = 10_000 |
or:
| let randomHighValue: UInt64 = 10_1000 | |
| let randomHighValue: UInt64 = 101_000 |
Codecov Report
@@ Coverage Diff @@
## main #72 +/- ##
==========================================
+ Coverage 96.36% 96.44% +0.07%
==========================================
Files 14 14
Lines 578 591 +13
==========================================
+ Hits 557 570 +13
Misses 21 21
|
2c17337 to
f26e921
Compare
| break | ||
| } | ||
| } | ||
| if let endOffset, offsetInFile != endOffset { |
There was a problem hiding this comment.
If use maximumOffset set default value for endOffset in the func's parameter, this line need change to offsetInFile < endOffset.
If changed to this, we need to change some test cases, like test_messages_whenOffsetInFileAtEndOfNewestMessageIsBeyondEndOfFile_throwsFileCorrupted. That case won't get error when offsetInFileAtEndOfNewestMessage is beyond EOF.
|
I will dig into this next week as well. Thanks @jianjunwoo ! |
bachand
left a comment
There was a problem hiding this comment.
Great work @jianjunwoo !
My understanding of this PR is as follows... We aren't currently aware of any type of file corruption that could lead to an infinite loop. At the same time, the way that code is currently written means that an infinite loop remains possible. This PR refactors the library so that an infinite loop is not possible.
It's worth pointing out that so far we haven't (as far as I can recall) acknowledged the two different "modes" of CacheAdvance in the code. The code has been written to transparently handle both situations. There's something elegant with not explicitly acknowledging each case and writing code that will work in either. At the same time, I have found that I spend a lot of time each time I review changes to CacheReader.swift thinking about the different possibilities. I feel like this change makes it easier to consider the edge cases since they are more explicitly captured in code.
I really like that we didn't need to change any tests in this PR. That makes me confident in the change.
Let's give @dfed the time to review as well to make sure he's onboard before we move ahead 👍
| messages.append(try decoder.decode(T.self, from: encodedMessage)) | ||
| } | ||
| // There is only one range: | `offsetInFileOfOldestMessage` -> `offsetInFileAtEndOfNewestMessage`| | ||
| if reader.offsetInFileOfOldestMessage < reader.offsetInFileAtEndOfNewestMessage { |
There was a problem hiding this comment.
When the cache is empty I assume that reader.offsetInFileOfOldestMessage == reader.offsetInFileAtEndOfNewestMessage? I wonder if it would be best to explicitly handle the case of these two values being equal, to make this code easier to reason about.
There was a problem hiding this comment.
It would also be nice to validate my assumption that reader.offsetInFileOfOldestMessage == reader.offsetInFileAtEndOfNewestMessage when the cache is empty.
There was a problem hiding this comment.
When the cache is empty I assume that
reader.offsetInFileOfOldestMessage==reader.offsetInFileAtEndOfNewestMessage
Per this comment block
CacheAdvance/Tests/CacheAdvanceTests/CacheAdvanceTests.swift
Lines 367 to 368 in 3a1ed6f
when the reading handle and the writing handle point at the same position the file is empty. The reader starts out at header.offsetInFileOfOldestMessage, and the writer starts out at header.offsetInFileAtEndOfNewestMessage. And our reader.offsetInFileOfOldestMessage and reader.offsetInFileAtEndOfNewestMessage should always be set to the same values as those in the header, so your assumption is indeed true. I like the idea of explicitly handling this case.
There was a problem hiding this comment.
like this idea too. handle == as empty will make it more clear
| /// | ||
| /// - Parameters: | ||
| /// - file: The file URL indicating the desired location of the on-disk store. This file should already exist. | ||
| /// - maximumBytes: The maximum size of the cache, in bytes. |
| } | ||
|
|
||
| /// Returns the next encodable message, seeking to the beginning of the next message. | ||
| func nextEncodedMessage() throws -> Data? { |
There was a problem hiding this comment.
Can we make this private now?
There was a problem hiding this comment.
It could be private, let me do in next commit.
I kept it here to make the diff more readable.
| } | ||
|
|
||
| // We know the next message is at the end of the file header. Let's seek to it. | ||
| try reader.seek(to: FileHeader.expectedEndOfHeaderInFile) |
There was a problem hiding this comment.
I agree that the library is safer against unknown unknown errors without this line of code.
dfed
left a comment
There was a problem hiding this comment.
I left a ton of little nit suggestions, but overall I really dig this approach. I appreciate your spending the time to make this library easier to maintain!
I have also validated that the lines not covered by tests in CacheReader.swift are already not covered on main.
| try reader.seek(to: startOffset) | ||
| while let data = try nextEncodedMessage() { | ||
| encodedMessages.append(data) | ||
| if let endOffset = endOffset, offsetInFile >= endOffset { |
There was a problem hiding this comment.
When offsetInFile > endOffset, wouldn't that mean that the file is corrupted?
EDIT: yup! you covered this case a few lines down 😄
| } | ||
|
|
||
| /// Returns the next encodable message, seeking to the beginning of the next message. | ||
| func nextEncodedMessage() throws -> Data? { |
There was a problem hiding this comment.
I really like that this method is no longer very smart. Our code is more declarative and less imperative now, and the new approach should be simpler to maintain 😄
Co-authored-by: Dan Federman <dfed@me.com>
37f2e7e to
f623856
Compare
…nFileAtEndOfNewestMessage
| // MARK: Private | ||
|
|
||
| /// Returns the next encodable message, seeking to the beginning of the next message. | ||
| private func nextEncodedMessage() throws -> Data? { |
There was a problem hiding this comment.
Mark this func as private since it is not used by other files any more.
| } | ||
| } | ||
| } | ||
| if let endOffset = endOffset, offsetInFile != endOffset { |
There was a problem hiding this comment.
do we need this if anymore given the if within the while loop?
There was a problem hiding this comment.
This make no difference compared to the code before this. However, this looks more readable to handle == and > separately. @dfed
if offsetInFile >= endOffset {
break
}
There was a problem hiding this comment.
Ahh. This code is catching the offsetInFile < endOffset case. Maybe we could make that explicit here? Far from necessary, but it could make the intent more clear if I'm reading this right.
There was a problem hiding this comment.
Oooh, I misunderstood your question. The if within the while loop is need to check offsetInFile == endOffset to stop at the end.
There was a problem hiding this comment.
Ahh. This code is catching the
offsetInFile < endOffsetcase. Maybe we could make that explicit here? Far from necessary, but it could make the intent more clear if I'm reading this right.
I agree it would be nice to make this more clear.
| } | ||
| } else if reader.offsetInFileOfOldestMessage == reader.offsetInFileAtEndOfNewestMessage { | ||
| // This is an empty cache. | ||
| return [] |
| } | ||
| } | ||
| } | ||
| if let endOffset = endOffset, offsetInFile != endOffset { |
There was a problem hiding this comment.
Ahh. This code is catching the
offsetInFile < endOffsetcase. Maybe we could make that explicit here? Far from necessary, but it could make the intent more clear if I'm reading this right.
I agree it would be nice to make this more clear.
|
Looks like we've all set about the details, or do we still have any other comments that I missed to discuss? |
|
That's it! Thanks again @jianjunwoo for tackling this + making future maintenance of this lib easier! Your contributions have immensely improved this library in recent weeks 🙏 |
My concern is there is still this line of code:
try reader.seek(to: FileHeader.expectedEndOfHeaderInFile), which keeps the possibility to run into an infinite loop.Thus, I have a thought to fix this for
shouldOverwriteOldMessagestrue | false without the seek back action in this func.shouldOverwriteOldMessagesis true and new_offset < old_offset , split the reader by two ranges according old message offset, new message offset.1.1 Range 1 would be : from old message offset to file end.
1.2 Range 2 would be: from top to newest offset
shouldOverwriteOldMessagesis false, the reader read from top to end.@dfed @bachand